Review #2412
Closed
Conversation
- fix 404s due to OpenVINO link structure change (2023.3 -> 2024) where necessary
- spelling fixes
CVS-135106 --------- Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>
* validate class and execute method existence, extend pyovms.Tensor constructor, fix finalize not called issue, print with flush in demos
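A rough sketch of what validating a Python node class for a required `execute` method could look like, together with the flushed printing used in the demos. The class and helper names here are illustrative, not the actual OVMS implementation:

```python
# Hypothetical user-supplied node class; a real node would run
# inference inside execute() instead of echoing its inputs.
class OvmsPythonModel:
    def execute(self, inputs):
        return inputs

def validate_node_class(cls):
    """Reject node classes that lack a callable execute() method."""
    if not callable(getattr(cls, "execute", None)):
        raise TypeError(f"{cls.__name__} must define an execute() method")
    return cls

validate_node_class(OvmsPythonModel)

# Demos print with flush=True so output is not held in buffered pipes.
print("node validated", flush=True)
```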
Fixed bugs in capi benchmark app, documented and created demo showcasing benchmark app features
--------- Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>
* Resolve Python node TODOs
* Dump to file flags
- smart building depending on the content
- parallel tests execution
- build performance optimization
With the verbose flag, unpacking the boost tar file alone produces ~67,000 lines of messages in the build logs, which makes the build process challenging to audit.
…change (#2393) CVS-136795
* Allow flag injection to pugixml

This commit contains a patch that adds variables for the CXX and linker flags to the CMakeLists.txt file. The patch is then applied during the build so that build flags can later be injected on the cmake command line.

* exclude header check
* fix dockerfile sequence
* set ubi as the default base image

--------- Co-authored-by: Steve Grubb <ausearch.1@gmail.com>
* Add string output demo
…MediaPipe stream info (#2395)
* Add support for _contents fields in KServe request input for MediaPipe, for all deserialization paths --------- Co-authored-by: atobisze <adrian.tobiszewski@intel.com>
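For illustration, a KServe v2-style request carrying its data in a `*_contents` field rather than in raw input contents might look like the sketch below. The model name is hypothetical and the dict mirrors the gRPC message layout, which is an assumption here:

```python
# Sketch of a KServe v2 inference request whose input payload sits in
# one of the *_contents fields (bytes_contents for BYTES tensors),
# as opposed to the raw_input_contents path.
request = {
    "model_name": "my_mediapipe_graph",  # hypothetical graph name
    "inputs": [
        {
            "name": "input_text",
            "datatype": "BYTES",
            "shape": [1],
            "contents": {"bytes_contents": [b"hello"]},
        }
    ],
}
```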
* Fixing references
* Fix internal link
* universal_and_benchmark_documentation_updates
* no proxy update
* update benchmark proxy
* add version to ubuntu tag
* revert ubuntu changes
* added localhost
* review
* dockerfile for gradio
* monitoring changes in the documents scope
* preinstall nltk modules
* default security context set to ovms account
* improvements in rag demo
CVS-138032 Implementation of the /v3/chat/completions endpoint and forwarding of the HTTP message to a MediaPipe graph. The data is a std::string for now, to be adjusted in the following tasks (CVS-139240/CVS-140684).
* CVS-137992_fix_deadline_exceeded_dg2
* add retry for get_model_metadata_request
* add get_model_metadata function
* fix test names
* increase timeout for GetModelStatus
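The retry added for the metadata request can be sketched as a small wrapper like the one below. The helper name, the simulated flaky call, and the use of `TimeoutError` to stand in for a gRPC deadline error are all assumptions for illustration:

```python
import time

def with_retry(fn, retries=3, delay=0.5, exceptions=(TimeoutError,)):
    """Call fn, retrying a few times on deadline-style errors."""
    for attempt in range(retries):
        try:
            return fn()
        except exceptions:
            if attempt == retries - 1:
                raise  # out of attempts, surface the error
            time.sleep(delay)

# Usage sketch: a metadata call that fails once, then succeeds.
calls = {"n": 0}
def get_model_metadata_request():
    calls["n"] += 1
    if calls["n"] < 2:
        raise TimeoutError("deadline exceeded")
    return {"name": "resnet", "versions": ["1"]}
```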
https://jira.devtools.intel.com/browse/CVS-139240 Implementation of chat completion request conversion to HttpPayload struct.
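From the client side, a request hitting such an endpoint could be built as sketched below. The host, port, and model name are assumptions; the payload follows the OpenAI chat completions schema, and the server converts the HTTP message to the HttpPayload struct:

```python
import json
import urllib.request

# Hypothetical server address; the endpoint path comes from the PR.
url = "http://localhost:8000/v3/chat/completions"

payload = {
    "model": "llm",  # illustrative model/graph name
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
}

def build_request(url, payload):
    """Build the POST request the server parses into HttpPayload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

req = build_request(url, payload)
# urllib.request.urlopen(req) would send it to a running server.
```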
* Fix ovms status to http status conversion
* add-version-to-ubuntu-os * fix ovms_pkg link * BASE_OS_DISTRO * ovms_pkg os * updates * DIST_OS added * adjust nginx build * fix nginx * Update Makefile Co-authored-by: ngrozae <104074686+ngrozae@users.noreply.github.com> * Update Makefile Co-authored-by: ngrozae <104074686+ngrozae@users.noreply.github.com> --------- Co-authored-by: ngrozae <104074686+ngrozae@users.noreply.github.com>
CVS-139231/CVS-139233 This introduces an LLM calculator that accepts HTTP OpenAI /v3/chat/completions requests and produces compliant responses. It works in both unary and streaming modes. A bunch of parameters are still marked as TODO, but it should be enough to perform benchmarks. Includes a minimal demo description of how to run it.
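In streaming mode the response typically arrives as server-sent events; a client-side sketch of extracting the text deltas from such a stream is shown below. The chunk shape follows the OpenAI streaming convention, which is an assumption about the wire format here:

```python
import json

def parse_sse_chunks(lines):
    """Yield content deltas from OpenAI-style 'data: {...}' SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # ignore keep-alives and blank lines
        data = line[len("data: "):]
        if data == "[DONE]":  # end-of-stream sentinel
            return
        delta = json.loads(data)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Example stream as it might appear on the wire.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
```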
* Add scheduler config in graph options
* Fix CentOS Stream 8
--------- Co-authored-by: Miłosz Żeglarski <milosz.zeglarski@intel.com> Co-authored-by: ngrozae <104074686+ngrozae@users.noreply.github.com>
CVS-142768 Forwards beam search and multinomial sampling parameters to the CB library; this enables returning more than one completion for beam search (unary only). Adds profiling traces (minitrace).
* Add UTs for llm request conversion
* fix tbb handling for ubuntu20
There is an issue (or feature?) where adding a newly generated token to the token cache can produce a shorter decoded message than the previous cache contents without that token. TextStreamer did not expect this behavior. The fix ignores such events and makes the generation wait for the next tokens. It also reduces the number of response chunks by requiring that a chunk contain a space before the cache is sent to the client.
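The behavior described above can be sketched roughly as follows; the class shape, the `decode` callback, and the bookkeeping names are illustrative, not the actual OVMS implementation:

```python
class TextStreamer:
    """Minimal sketch: ignore cache shrinkage and flush a chunk only
    once the pending text contains a space."""

    def __init__(self, decode):
        self.decode = decode      # token-id list -> text
        self.tokens = []
        self.printed_len = 0      # length of text already flushed
        self.chunks = []          # chunks "sent" to the client

    def put(self, token):
        self.tokens.append(token)
        text = self.decode(self.tokens)
        # Adding a token may make the decoded text *shorter* than what
        # was there before; ignore that event and wait for more tokens.
        if len(text) <= self.printed_len:
            return
        new_text = text[self.printed_len:]
        # Require a space in the pending chunk before sending it, which
        # reduces the number of response chunks.
        if " " in new_text:
            self.chunks.append(new_text)
            self.printed_len = len(text)
```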